Aesthetics & Scales with Pokémon

DSST 289: Introduction to Data Science

Erik Fredner

2024-09-04

Outline

  • Homework
  • Aesthetics & scales: so what?
  • Aesthetics & scales with Pokémon
    • pokemon data
    • geom_point
    • geom_text & label
    • scale_ & limits
    • n.breaks
    • color
    • scale_color
    • size
    • shape
    • facet_ing plots

Homework

  • Share the data visualizations you found online with the people sitting near you.
  • What kind of data would you need to recreate them?

Aesthetics: so what?

Aesthetics (such as color, size, shape, etc.) determine how data points are visually distinguished in a plot.

For example:

Democrats vs. Republicans, conventionally distinguished by color (blue vs. red)

Scales: so what?

  • Scales control how data is mapped onto visual dimensions like the x- and y-axes.
  • Proper scaling can prevent misleading representations.

Code
# Load necessary libraries
library(tidyverse) # includes ggplot2, dplyr, and readr
library(gridExtra) # grid.arrange()
library(ggrepel)   # geom_text_repel()
library(knitr)     # kable()

# Dummy data
data <- data.frame(
  year = c(2010, 2011, 2012, 2013, 2014, 2015),
  interest_rate = c(3.5, 3.7, 3.6, 3.8, 3.9, 4.0)
)

# Plot 1: With a narrow y-axis
p1 <- ggplot(data, aes(x = year, y = interest_rate)) +
  geom_line(color = "blue", linewidth = 1) +
  geom_point(color = "blue", size = 3) +
  scale_y_continuous(limits = c(3.4, 4.1))

# Plot 2: With a broader y-axis
p2 <- ggplot(data, aes(x = year, y = interest_rate)) +
  geom_line(color = "red", linewidth = 1) +
  geom_point(color = "red", size = 3) +
  scale_y_continuous(limits = c(0, 5))

# Arrange plots side by side
grid.arrange(p1, p2, ncol = 2)

pokemon data

Code
pokemon <- read_csv("../data/pokemon.csv")

# take a look at the data:
pokemon |> 
  head() |> 
  kable()
pokedex_no  name        form  type_1  type_2  stat_total  hp  attack  defense  sp_attack  sp_defense  speed  generation
1           Bulbasaur   NA    Grass   Poison  318         45  49      49       65         65          45     1
2           Ivysaur     NA    Grass   Poison  405         60  62      63       80         80          60     1
3           Venusaur    NA    Grass   Poison  525         80  82      83       100        100         80     1
4           Charmander  NA    Fire    NA      309         39  52      43       60         50          65     1
5           Charmeleon  NA    Fire    NA      405         58  64      58       80         65          80     1
6           Charizard   NA    Fire    Flying  534         78  84      78       109        85          100    1

Aesthetics & Scales with Pokémon

By default, the Pokémon with the highest defense and hp appear in the top-right corner:

Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp))

Modifying scales

Suppose we want to flip that so the Pokémon with the highest defense and lowest hp appear in the top-right corner.

Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  # reverse the y-axis
  scale_y_reverse()

Combining scale_, aes, & geom_

Who has low hp and high defense?

Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  scale_y_reverse() +
  # new:
  geom_text(aes(x = defense, y = hp, label = name))
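
The outline also mentions geom_label, which accepts the same aesthetics as geom_text but draws each label inside a filled box. A minimal sketch, assuming a six-row `starters` tibble (a name invented for this example) typed in from the first rows of the table above so that it runs without pokemon.csv:

```r
library(ggplot2)
library(tibble)

# Assumption: the first six rows of the pokemon data, entered by hand
# so this sketch does not depend on the CSV file.
starters <- tibble(
  name    = c("Bulbasaur", "Ivysaur", "Venusaur",
              "Charmander", "Charmeleon", "Charizard"),
  defense = c(49, 63, 83, 43, 58, 78),
  hp      = c(45, 60, 80, 39, 58, 78)
)

p_label <- starters |>
  ggplot(aes(x = defense, y = hp)) +
  geom_point() +
  scale_y_reverse() +
  # same aesthetics as geom_text(), but each label sits in a box
  geom_label(aes(label = name))

p_label
```

On the full data set the boxes would overlap heavily, so this is probably most useful after filtering down to a handful of points.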

Limiting scales

Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  scale_y_reverse() +
  # repel the text labels so they don't overlap (from the ggrepel package):
  geom_text_repel(aes(x = defense, y = hp, label = name)) +
  # limit the x-axis to `defense` of 150 or more; points below 150 are dropped.
  # `NA` ("Not Available") is a missing value indicator.
  # We use it here to say that there is no upper limit on the x-axis.
  scale_x_continuous(limits = c(150, NA))
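
Setting `limits` on a scale removes the out-of-range points from the plot rather than just hiding them. If the goal is only to zoom in, coord_cartesian is one alternative that keeps every row. A rough sketch of the difference, assuming a hand-typed six-row `starters` tibble (a name invented here) and a cutoff of 60 chosen to suit that small sample:

```r
library(ggplot2)
library(tibble)

# Assumption: first six rows of the pokemon data, entered by hand.
starters <- tibble(
  name    = c("Bulbasaur", "Ivysaur", "Venusaur",
              "Charmander", "Charmeleon", "Charizard"),
  defense = c(49, 63, 83, 43, 58, 78),
  hp      = c(45, 60, 80, 39, 58, 78)
)

# Scale limits: defense values below 60 become NA and are not drawn.
p_limits <- ggplot(starters, aes(x = defense, y = hp)) +
  geom_point() +
  scale_x_continuous(limits = c(60, NA))

# Coordinate limits: every row is kept; only the visible window changes.
p_zoom <- ggplot(starters, aes(x = defense, y = hp)) +
  geom_point() +
  coord_cartesian(xlim = c(60, 85))

# Count the points that still have an x position in each plot.
sum(!is.na(layer_data(p_limits)$x))
sum(!is.na(layer_data(p_zoom)$x))
```

The distinction matters most for layers that compute summaries, such as smoothers or boxplots, because points dropped by scale limits no longer contribute to the calculation.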

Increasing n.breaks

Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  scale_y_reverse() +
  geom_text_repel(aes(x = defense, y = hp, label = name)) +
  # make it easier to identify the precise values of `defense`:
  scale_x_continuous(limits = c(150, NA), n.breaks = 30)
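
`n.breaks` is a request rather than a guarantee: ggplot2 may pick a slightly different number of breaks so that the labels land on round values. Where exact tick positions matter, one option is to pass them explicitly through `breaks`. A small sketch, again assuming a hand-typed `starters` tibble (a name invented for this example) in place of the full data:

```r
library(ggplot2)
library(tibble)

# Assumption: first six rows of the pokemon data, entered by hand.
starters <- tibble(
  name    = c("Bulbasaur", "Ivysaur", "Venusaur",
              "Charmander", "Charmeleon", "Charizard"),
  defense = c(49, 63, 83, 43, 58, 78),
  hp      = c(45, 60, 80, 39, 58, 78)
)

p_breaks <- ggplot(starters, aes(x = defense, y = hp)) +
  geom_point() +
  # one tick every 5 units of defense, from 40 to 85
  scale_x_continuous(breaks = seq(40, 85, by = 5))

p_breaks
```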

Color

  • We can map color to a variable to reveal patterns in the data.
  • e.g., Are there relationships between type_1, defense, and hp?
  • We’re also going to filter for first-generation Pokémon to reduce the number of points.

Color by type_1

Code
pokemon |>
  filter(generation == 1) |>
  ggplot() +
  geom_point(aes(x = defense, y = hp, color = type_1)) +
  geom_text_repel(aes(x = defense, y = hp, label = name))
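
Here color sits inside aes(), so it is mapped to a variable. In the interest-rate plots earlier, color = "blue" sat outside aes(), so it was set to a constant. The contrast can be sketched side by side, assuming a hand-typed six-row `starters` tibble (a name invented for this example):

```r
library(ggplot2)
library(tibble)

# Assumption: first six rows of the pokemon data, entered by hand.
starters <- tibble(
  name    = c("Bulbasaur", "Ivysaur", "Venusaur",
              "Charmander", "Charmeleon", "Charizard"),
  type_1  = c("Grass", "Grass", "Grass", "Fire", "Fire", "Fire"),
  defense = c(49, 63, 83, 43, 58, 78),
  hp      = c(45, 60, 80, 39, 58, 78)
)

# Mapped: color is inside aes(), so it varies with type_1 and gets a legend.
p_mapped <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp, color = type_1))

# Set: color is outside aes(), so every point is the same blue, with no legend.
p_set <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp), color = "blue")

p_mapped
p_set
```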

Custom color

Let’s use colors associated with 🔥, 🍃, and 💧 Pokémon:

Code
pokemon |>
  filter(generation == 1) |>
  filter(type_1 %in% c("Water", "Fire", "Grass")) |>
  ggplot() +
  geom_point(aes(x = defense, y = hp, color = type_1)) +
  geom_text_repel(aes(x = defense, y = hp, label = name)) +
  # assign a matching color to each `type_1` instead of using the defaults:
  scale_color_manual(values = c(
    Water = "blue",
    Fire = "red",
    Grass = "green"
  ))

scale_color

Mewtwo has a high stat_total:

Code
pokemon |>
  filter(generation == 1) |>
  ggplot() +
  # color the points by `stat_total` instead of `type_1`:
  geom_point(aes(x = defense, y = hp, color = stat_total)) +
  # use the `viridis` color palette instead of the default:
  scale_color_viridis_c() +
  geom_text_repel(aes(x = defense, y = hp, label = name))
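
The `_c` suffix stands for "continuous", which suits a numeric variable such as stat_total. For a categorical variable such as type_1, the discrete counterpart is scale_color_viridis_d. A short sketch of the pairing, assuming a hand-typed six-row `starters` tibble (a name invented for this example):

```r
library(ggplot2)
library(tibble)

# Assumption: first six rows of the pokemon data, entered by hand.
starters <- tibble(
  name       = c("Bulbasaur", "Ivysaur", "Venusaur",
                 "Charmander", "Charmeleon", "Charizard"),
  type_1     = c("Grass", "Grass", "Grass", "Fire", "Fire", "Fire"),
  stat_total = c(318, 405, 525, 309, 405, 534),
  defense    = c(49, 63, 83, 43, 58, 78),
  hp         = c(45, 60, 80, 39, 58, 78)
)

# Numeric variable: continuous viridis gradient.
p_continuous <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp, color = stat_total)) +
  scale_color_viridis_c()

# Categorical variable: one distinct viridis color per type.
p_discrete <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp, color = type_1)) +
  scale_color_viridis_d()

p_continuous
p_discrete
```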

size

Magikarp has a low stat_total:

Code
pokemon |>
  filter(generation == 1) |>
  # just water pokemon
  filter(type_1 == "Water") |>
  ggplot() +
  # new: `size` by `stat_total`
  geom_point(aes(x = defense, y = hp, size = stat_total)) +
  geom_text_repel(aes(x = defense, y = hp, label = name))

Combine size and color

Code
pokemon |>
  filter(generation == 1) |>
  # just psychic pokemon
  filter(type_1 == "Psychic") |>
  ggplot() +
  # new: `color` by `stat_total`, too
  geom_point(aes(x = defense, y = hp, size = stat_total, color = stat_total)) +
  # use the `viridis` color palette instead of the default:
  scale_color_viridis_c() +
  geom_text_repel(aes(x = defense, y = hp, label = name))

Combining color and shape

Code
pokemon |>
  # filter for first gen
  filter(generation == 1) |>
  # filter for a few types
  filter(type_1 %in% c("Normal", "Rock", "Bug", "Poison")) |>
  ggplot() +
  geom_point(aes(
    x = defense,
    y = hp,
    # new: shape points by `type_1`
    shape = type_1,
    # color points by `stat_total`
    color = stat_total
  )) +
  scale_color_viridis_c() +
  geom_text_repel(aes(x = defense, y = hp, label = name))

Bonus: facet-ing plots

Code
# faceting splits a plot into multiple panels, one per value of a categorical variable
# keeping the same scales in every panel (the default) makes the panels directly comparable
pokemon |>
  # exclude top 1% of stat_total to see better color distribution:
  filter(stat_total < quantile(stat_total, 0.99)) |>
  ggplot() +
  geom_point(aes(x = defense, y = hp, color = stat_total)) +
  scale_color_viridis_c() +
  # new: `~` means "by", so we are saying "facet wrap by `type_1`"
  facet_wrap(~type_1)
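
By default facet_wrap gives every panel the same axes, which is what makes the panels comparable. Passing scales = "free" lets each panel choose its own ranges, which shows more detail within a panel but makes comparisons across panels harder. A sketch of both, assuming a hand-typed six-row `starters` tibble (a name invented for this example):

```r
library(ggplot2)
library(tibble)

# Assumption: first six rows of the pokemon data, entered by hand.
starters <- tibble(
  name    = c("Bulbasaur", "Ivysaur", "Venusaur",
              "Charmander", "Charmeleon", "Charizard"),
  type_1  = c("Grass", "Grass", "Grass", "Fire", "Fire", "Fire"),
  defense = c(49, 63, 83, 43, 58, 78),
  hp      = c(45, 60, 80, 39, 58, 78)
)

# Default (scales = "fixed"): every panel shares the same x- and y-axes.
p_fixed <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp)) +
  facet_wrap(~type_1)

# Free scales: each panel fits its axes to its own data.
p_free <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp)) +
  facet_wrap(~type_1, scales = "free")

p_fixed
p_free
```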

Summary

  • Aesthetics determine how data points are visually distinguished, including aspects like color, size, and shape.
  • Scales control how data is mapped onto visual dimensions such as x- and y-axes. Proper scaling ensures that visualizations are easy to interpret and not misleading.
  • Manipulating both aesthetics and scales can reveal patterns and/or outliers in data.
  • Preserving scales on faceted plots can make them directly comparable.